Morphological Analysis of Inflective Languages through Generation

Authors

  • Alexander F. Gelbukh
  • Grigori Sidorov
Abstract

A crucial problem in the development of systems for automatic morphological analysis of inflective languages is the treatment of stem alternations. Existing models require rules that specify which stems can be generated from a given one. Many such rules (about a thousand for Russian, for example) have no reasonable linguistic interpretation. We suggest a method that avoids such rules by generating and verifying hypotheses about possible grammatical forms. Methods of this type are known as analysis through generation; they make system development much simpler than the standard direct approach. A morphological analysis and generation system for Russian developed with our method is freely available for academic use; a Spanish system is being implemented.
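
The generate-and-verify idea in the abstract can be illustrated with a small sketch. This is not the authors' system: the lexicon layout, the paradigm table, and the names `LEXICON`, `PARADIGMS`, `generate`, and `analyze` are all assumptions made for illustration, and the data is a simplified, transliterated fragment of the Russian noun "okno" (window), whose genitive plural "okon" shows a stem alternation. The sketch also assumes that stem allomorphs are stored in the dictionary rather than derived by rules.

```python
# Toy lexicon (hypothetical layout): lemma -> stored stem allomorphs + paradigm class.
LEXICON = {
    "okno": {"stems": ["okn", "okon"], "cls": "neut_o"},
}

# Toy paradigm (hypothetical): (case, number) -> (index of stem to use, ending).
PARADIGMS = {
    "neut_o": {
        ("nom", "sg"): (0, "o"),
        ("gen", "sg"): (0, "a"),
        ("nom", "pl"): (0, "a"),
        ("gen", "pl"): (1, ""),
    },
}


def generate(lemma, features):
    """Build the word form of `lemma` for the given (case, number)."""
    entry = LEXICON[lemma]
    stem_index, ending = PARADIGMS[entry["cls"]][features]
    return entry["stems"][stem_index] + ending


def analyze(word):
    """Analysis through generation: hypothesize, generate, compare."""
    results = []
    # Try every split of the input into a candidate stem and ending.
    for split in range(1, len(word) + 1):
        stem, ending = word[:split], word[split:]
        for lemma, entry in LEXICON.items():
            if stem not in entry["stems"]:
                continue
            # Each feature set whose ending matches is a hypothesis.
            for features, (_, hyp_ending) in PARADIGMS[entry["cls"]].items():
                if hyp_ending != ending:
                    continue
                # Verify the hypothesis by generating the form.
                if generate(lemma, features) == word:
                    results.append((lemma, features))
    return results


print(analyze("okon"))   # the alternating stem with an empty ending
print(analyze("okna"))   # ambiguous between two feature sets
print(analyze("okona"))  # wrong stem for this ending: every hypothesis fails
```

In this sketch the analyzer needs no rule stating which stem goes with which ending. The generator already encodes that knowledge, so a hypothesis such as "okon" + "a" is discarded simply because generation yields a different form.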

Similar resources

Approach to Construction of Automatic Morphological Analysis Systems for Inflective Languages with Little Effort

Development of morphological analysis systems for inflective languages is a tedious and laborious task. We suggest an approach to developing such systems that requires less time and effort. It is based on static processing of stem allomorphs and the method of analysis known as “analysis through generation.” These features allow for using the morphological models oriented to generat...

Morphemic Analysis: A Dictionary Lookup Instead of Real Analysis

This paper presents an approach for developing morphological and morphemic analysis systems for inflective languages based on a simple and fast dictionary lookup instead of any kind of analysis of the input word form. This approach allows the information about the word forms (lemma, tag, morpheme structure, derived words, derivational relations) to be described according to the traditional gram...

Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition

We present two recently released open-source taggers: NameTag is free software for named entity recognition (NER) which achieves state-of-the-art performance on Czech; MorphoDiTa (Morphological Dictionary and Tagger) performs morphological analysis (with lemmatization), morphological generation, tagging and tokenization with state-of-the-art results for Czech and a throughput around 10-200K wo...

Semi-Automatic Parallel Corpora Extraction from Comparable News Corpora

A parallel corpus is a necessary resource in many multilingual and cross-lingual natural language processing applications, including Machine Translation and Cross-Lingual Information Retrieval. Preparing a large-scale parallel corpus takes time and also demands linguistic skill. In the present work, a technique has been developed that extracts parallel corpus between Manipuri, a morphologicall...

Reduction of Morpho-Syntactic Features in Statistical Machine Translation of Highly Inflective Language

We address the problem of statistical machine translation from a highly inflective language to a less inflective one. The characteristics of inflective languages are generally not taken into account by statistical machine translation systems. Existing translation systems often treat different inflected word forms of the same lemma as if they were independent of each other, although some interdep...

Journal title:
  • Procesamiento del Lenguaje Natural

Volume 29  Issue 

Pages  -

Publication date 2002